1,189 research outputs found

    Electrical Grid Anomaly Detection via Tensor Decomposition

    Full text link
    Supervisory Control and Data Acquisition (SCADA) systems often serve as the nervous system for substations within power grids. These systems facilitate real-time monitoring, data acquisition, control of equipment, and ensure smooth and efficient operation of the substation and its connected devices. Previous work has shown that dimensionality reduction-based approaches, such as Principal Component Analysis (PCA), can be used for accurate identification of anomalies in SCADA systems. While not specifically applied to SCADA, non-negative matrix factorization (NMF) has shown strong results at detecting anomalies in wireless sensor networks. These unsupervised approaches model the normal or expected behavior and detect the unseen types of attacks or anomalies by identifying the events that deviate from the expected behavior. These approaches; however, do not model the complex and multi-dimensional interactions that are naturally present in SCADA systems. Differently, non-negative tensor decomposition is a powerful unsupervised machine learning (ML) method that can model the complex and multi-faceted activity details of SCADA events. In this work, we novelly apply the tensor decomposition method Canonical Polyadic Alternating Poisson Regression (CP-APR) with a probabilistic framework, which has previously shown state-of-the-art anomaly detection results on cyber network data, to identify anomalies in SCADA systems. We showcase that the use of statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of identifying anomalies in electrical grid systems. In our experiments, we model real-world SCADA system data collected from the electrical grid operated by Los Alamos National Laboratory (LANL) which provides transmission and distribution service through a partnership with Los Alamos County, and detect synthetically generated anomalies.Comment: 8 pages, 2 figures. In IEEE Military Communications Conference, Artificial Intelligence for Cyber Workshop (MILCOM), 202

    COVID-19 Kaggle Literature Organization

    Full text link
    The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020. Research in the subject matter was fast-tracked to such a point that scientists were struggling to keep up with new findings. With this increase in the scientific literature, there arose a need for organizing those documents. We describe an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques so that papers on similar topics are grouped together. By doing so, the navigation of topics and related papers is simplified. We implemented this approach using the widely recognized CORD-19 dataset to present a publicly available proof of concept.Comment: Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas, and Ben Johnson. 2020. COVID-19 Kaggle Literature Organization. In ACM Sym-posium on Document Engineering 2020 (DocEng 20), September 29-October2, 2020, Virtual Event, CA, USA.ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3395027.341959

    MalwareDNA: Simultaneous Classification of Malware, Malware Families, and Novel Malware

    Full text link
    Malware is one of the most dangerous and costly cyber threats to national security and a crucial factor in modern cyber-space. However, the adoption of machine learning (ML) based solutions against malware threats has been relatively slow. Shortcomings in the existing ML approaches are likely contributing to this problem. The majority of current ML approaches ignore real-world challenges such as the detection of novel malware. In addition, proposed ML approaches are often designed either for malware/benign-ware classification or malware family classification. Here we introduce and showcase preliminary capabilities of a new method that can perform precise identification of novel malware families, while also unifying the capability for malware/benign-ware classification and malware family classification into a single framework.Comment: Accepted at IEEE ISI 202

    Interactive Distillation of Large Single-Topic Corpora of Scientific Papers

    Full text link
    Highly specific datasets of scientific literature are important for both research and education. However, it is difficult to build such datasets at scale. A common approach is to build these datasets reductively by applying topic modeling on an established corpus and selecting specific topics. A more robust but time-consuming approach is to build the dataset constructively in which a subject matter expert (SME) handpicks documents. This method does not scale and is prone to error as the dataset grows. Here we showcase a new tool, based on machine learning, for constructively generating targeted datasets of scientific literature. Given a small initial "core" corpus of papers, we build a citation network of documents. At each step of the citation network, we generate text embeddings and visualize the embeddings through dimensionality reduction. Papers are kept in the dataset if they are "similar" to the core or are otherwise pruned through human-in-the-loop selection. Additional insight into the papers is gained through sub-topic modeling using SeNMFk. We demonstrate our new tool for literature review by applying it to two different fields in machine learning.Comment: Accepted at 2023 IEEE ICMLA conferenc

    Semi-supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection

    Full text link
    Identification of the family to which a malware specimen belongs is essential in understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable due to the lack of realistic evaluation factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this paper, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With HNMFk Classifier, we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families and helps with maintaining the performance of the model when a low quantity of labeled data is used. We perform bulk classification of nearly 2,900 both rare and prominent malware families, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.Comment: Accepted at ACM TOP

    Optimasi Portofolio Resiko Menggunakan Model Markowitz MVO Dikaitkan dengan Keterbatasan Manusia dalam Memprediksi Masa Depan dalam Perspektif Al-Qur`an

    Full text link
    Risk portfolio on modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, as human incompetence in predicting the future precisely that written in Al-Quran surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios, or alternatively, to maximize expected return among all portfolios that has at least a certain expected return. Furthermore, this study focuses on optimizing risk portfolio so called Markowitz MVO (Mean-Variance Optimization). Some theoretical frameworks for analysis are arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Moreover, finding a minimum variance portfolio produces a convex quadratic programming, that is minimizing the objective function ðð¥with constraintsð ð 𥠥 ðandð´ð¥ = ð. The outcome of this research is the solution of optimal risk portofolio in some investments that could be finished smoothly using MATLAB R2007b software together with its graphic analysis

    Measurement of the top quark forward-backward production asymmetry and the anomalous chromoelectric and chromomagnetic moments in pp collisions at √s = 13 TeV

    Get PDF
    Abstract The parton-level top quark (t) forward-backward asymmetry and the anomalous chromoelectric (d̂ t) and chromomagnetic (μ̂ t) moments have been measured using LHC pp collisions at a center-of-mass energy of 13 TeV, collected in the CMS detector in a data sample corresponding to an integrated luminosity of 35.9 fb−1. The linearized variable AFB(1) is used to approximate the asymmetry. Candidate t t ¯ events decaying to a muon or electron and jets in final states with low and high Lorentz boosts are selected and reconstructed using a fit of the kinematic distributions of the decay products to those expected for t t ¯ final states. The values found for the parameters are AFB(1)=0.048−0.087+0.095(stat)−0.029+0.020(syst),μ̂t=−0.024−0.009+0.013(stat)−0.011+0.016(syst), and a limit is placed on the magnitude of | d̂ t| < 0.03 at 95% confidence level. [Figure not available: see fulltext.

    Search for Physics beyond the Standard Model in Events with Overlapping Photons and Jets

    Get PDF
    Results are reported from a search for new particles that decay into a photon and two gluons, in events with jets. Novel jet substructure techniques are developed that allow photons to be identified in an environment densely populated with hadrons. The analyzed proton-proton collision data were collected by the CMS experiment at the LHC, in 2016 at root s = 13 TeV, and correspond to an integrated luminosity of 35.9 fb(-1). The spectra of total transverse hadronic energy of candidate events are examined for deviations from the standard model predictions. No statistically significant excess is observed over the expected background. The first cross section limits on new physics processes resulting in such events are set. The results are interpreted as upper limits on the rate of gluino pair production, utilizing a simplified stealth supersymmetry model. The excluded gluino masses extend up to 1.7 TeV, for a neutralino mass of 200 GeV and exceed previous mass constraints set by analyses targeting events with isolated photons.Peer reviewe

    Measurement of b jet shapes in proton-proton collisions at root s=5.02 TeV

    Get PDF
    We present the first study of charged-hadron production associated with jets originating from b quarks in proton-proton collisions at a center-of-mass energy of 5.02 TeV. The data sample used in this study was collected with the CMS detector at the CERN LHC and corresponds to an integrated luminosity of 27.4 pb(-1). To characterize the jet substructure, the differential jet shapes, defined as the normalized transverse momentum distribution of charged hadrons as a function of angular distance from the jet axis, are measured for b jets. In addition to the jet shapes, the per-jet yields of charged particles associated with b jets are also quantified, again as a function of the angular distance with respect to the jet axis. Extracted jet shape and particle yield distributions for b jets are compared with results for inclusive jets, as well as with the predictions from the pythia and herwig++ event generators.Peer reviewe
    corecore